Detecting Comment Spam through Content Analysis

نویسندگان

  • Congrui Huang
  • Qiancheng Jiang
  • Yan Zhang
چکیده

In the Web 2.0 eras, the individual Internet users can also act as information providers, releasing information or making comments conveniently. However, some participants may spread irresponsible remarks or express irrelevant comments for commercial interests. This kind of socalled comment spam severely hurts the information quality. This paper tries to automatically detect comment spam through content analysis, using some previously-undescribed features. Experiments on a real data set show that our combined heuristics can correctly identify comment spam with high precision(90.4%) and recall(84.5%).

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Poster: Effort-based Detection of Comment Spammers

Social media has become ubiquitous and important for content sharing. A typical example of how users contribute content to a social media platform is through comment threads in online articles. Unfortunately, there is an increasing prevalence of malicious activity in these threads by spammers through comment messages. The existing approaches tackling comment spam are comment-level in that they ...

متن کامل

NEIGHBORWATCHER: A Content-Agnostic Comment Spam Inference System

Comment spam has become a popular means for spammers to attract direct visits to target websites, or to manipulate search ranks of the target websites. Through posting a small number of spam messages on each victim website (e.g., normal websites such as forums, wikis, guestbooks, and blogs, which we term as spam harbors in this paper) but spamming on a large variety of harbors, spammers can not...

متن کامل

Detecting Content Spam on the Web through Text Diversity Analysis

Web spam is considered to be one of the greatest threats to modern search engines. Spammers use a wide range of content generation techniques known as content spam to fill search results with low quality pages. We argue that content spam must be tackled using a wide range of content quality features. In this paper we propose a set of content diversity features based on frequency rank distributi...

متن کامل

A Self-Supervised Approach to Comment Spam Detection Based on Content Analysis

This paper studies the problems and threats posed by a type of spam in the blogosphere, called blog comment spam. It explores the challenges introduced by comment spam, generalizing the analysis substantially to any other short text type spam. The authors analyze different high-level features of spam and legitimate comments based on the content of blog postings. The authors use these features t...

متن کامل

Library blogs and user participation: a survey about comment spam in library blogs

Purpose The purpose of this research is to identify and describe the impact of comment spam in library blogs. Three research questions guided the study: current level of commenting in library blogs; librarians' perception of comment spam; and techniques used to address the comment spam problem. Design/methodology/approach A quantitative approach is used to investigate research questions. Inform...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010